
feat(context): Add Prompt Caching to Code Context (CODY-4807) #6878

Merged — 15 commits merged into main from jlxu/cache_control on Feb 1, 2025

Conversation

@julialeex (Contributor) commented Jan 30, 2025

Problem (why)
Currently, the daily cost and token usage of our models are very high. We want to find ways to reduce them.

Solution (context)
Prompt caching can significantly reduce token costs: each cache hit cuts the cost of the cached tokens by 90%, while each cache miss (which writes the cache) adds a 25% surcharge.

After some initial analysis, we decided to start by implementing prompt caching for Claude models.
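
For intuition, here is a back-of-the-envelope sketch (not part of the PR) of how the cache-hit rate determines whether caching pays off, using the 90%/25% figures above:

```ts
// Rough estimate, assuming Anthropic-style pricing where a cache read costs 10% of the
// base input-token price and a cache write costs 125% of it (the 90% / 25% figures above).
const CACHE_READ_MULTIPLIER = 0.1
const CACHE_WRITE_MULTIPLIER = 1.25

/** Expected cost of cacheable tokens relative to not caching, for a hit rate in [0, 1]. */
function relativeCost(hitRate: number): number {
    return hitRate * CACHE_READ_MULTIPLIER + (1 - hitRate) * CACHE_WRITE_MULTIPLIER
}

// Break-even: 0.1·h + 1.25·(1 − h) = 1  ⇒  h ≈ 0.217, i.e. caching pays off once
// roughly 22% of requests hit the cache.
console.log(relativeCost(0.22)) // ≈ 1.0 (break-even)
console.log(relativeCost(0.8))  // ≈ 0.33, i.e. ~67% cheaper than no caching
```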

Implementation (what)

  • Adding cache_control: ephemeral to the prompt, which creates a cache entry with a 5-minute TTL (see the sketch after this list).
  • Server-side implementation in this PR
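
For context, a minimal illustration of what the ephemeral cache marker looks like in an Anthropic Messages API request body (this sketches the upstream API shape, not the exact code added in this PR; the model name and context variable are placeholders):

```ts
const retrievedCodeContext = '/* ...large block of retrieved code context... */'

// Illustrative Anthropic Messages API request body. The cache_control marker asks the
// API to cache the prompt prefix up to and including this block, with a ~5-minute TTL.
const requestBody = {
    model: 'claude-3-5-sonnet-latest', // placeholder model name
    max_tokens: 1024,
    system: [
        {
            type: 'text',
            text: retrievedCodeContext, // stable context that is reused across requests
            cache_control: { type: 'ephemeral' },
        },
    ],
    messages: [{ role: 'user', content: 'Explain what this function does.' }],
}
```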

Anthropic Docs (context)

Test plan

  • Tested locally and verified that the cache is being added

@julialeex requested a review from olafurpg on January 30, 2025 00:18
Review thread on lib/shared/src/chat/transcript/messages.ts (outdated, resolved)
Review thread on lib/shared/src/sourcegraph-api/completions/types.ts (outdated, resolved)
Review thread on vscode/src/edit/prompt/context.ts (outdated, resolved)
@julialeex requested a review from olafurpg on January 31, 2025 02:29
@olafurpg (Member) left a comment:

Blocking comments: 1) undo the snapshot/recording changes by not enabling prompt caching in agent integration tests, and 2) come up with a strategy to measure the effectiveness of prompt caching.

Currently, in the diff, prompt caching is unconditionally enabled everywhere and we don't track the impact with telemetry events. My expectation is that we should be able to enable/disable this feature dynamically with feature flags and we should be able to answer questions like how much faster responses are with prompt caching (or what % of tokens are cached).
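
A minimal sketch of that kind of gating, with assumed (not actual Cody) feature-flag and telemetry interfaces:

```ts
interface Message {
    speaker: 'human' | 'assistant' | 'system'
    text: string
    cacheEnabled?: boolean
}

// Assumed interfaces for illustration; the real Cody feature-flag and telemetry APIs may differ.
declare const featureFlagProvider: { evaluateFeatureFlag(flag: string): Promise<boolean> }
declare const telemetryRecorder: { recordEvent(feature: string, action: string): void }

async function applyPromptCaching(messages: Message[]): Promise<Message[]> {
    const enabled = await featureFlagProvider.evaluateFeatureFlag('cody-prompt-caching')
    if (!enabled) {
        return messages // flag off (the default in agent integration tests): no snapshot changes
    }
    telemetryRecorder.recordEvent('cody.promptCaching', 'enabled')
    // Mark the large, stable context messages as cacheable; a provider-specific
    // cache_control field would be attached when the request is serialized.
    return messages.map(m => (m.speaker === 'system' ? { ...m, cacheEnabled: true } : m))
}
```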

Review thread on vscode/src/prompt-builder/utils.ts (outdated, resolved)
Review thread on vscode/src/edit/prompt/context.ts (outdated, resolved)
Diff context for the comment below:

```
@@ -327,3 +332,12 @@ export class ClientConfigSingleton {
        return this.fetchConfigEndpoint(signal, config)
    }
}
// It's really complicated to access CodyClientConfig from functions like utils.ts
```
@olafurpg (Member) commented:

I think it's worth getting team alignment on what the singleton story should look like for CodyClientConfig. Currently, the interface is poorly designed: it mixes information from the server with custom local settings. I think a cleaner solution is to separate the /.api/client-config settings from the locally inferred settings. Going forward, we should stop using the site version as a feature-gating signal and exclusively use /.api/client-config, as you have done with api-version=7 in this PR.
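
A sketch of the kind of separation being suggested; the type and field names here are hypothetical:

```ts
// Hypothetical split between server-provided and locally inferred configuration.
interface ServerClientConfig {
    // Fetched from the Sourcegraph instance's /.api/client-config endpoint.
    apiVersion: number // e.g. 7, used for feature gating instead of the site version
    promptCachingEnabled: boolean
}

interface LocalClientConfig {
    // Derived purely from local editor settings and environment, never from the server.
    isDotCom: boolean
    experimentalOverrides: Record<string, boolean>
}

interface ResolvedClientConfig {
    server: ServerClientConfig
    local: LocalClientConfig
}
```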

@olafurpg (Member) commented:

Not a blocking comment; I just want to call out that I'm not 100% happy with how the current code is organized.

@julialeex (Contributor, author) replied Jan 31, 2025:

Ack. Can revisit this problem later.

Review thread on agent/src/custom-commands.test.ts (outdated, resolved)
Review thread on agent/src/__snapshots__/custom-commands.test.ts.snap (outdated, resolved)
@olafurpg (Member) commented:
I think we disable feature flags by default in agent integration tests, so if you gate caching with a feature flag, then the existing snapshot tests should pass without updating the recordings.

@olafurpg (Member) left a comment:
Stamping since I will be afk until Tuesday and we are targeting this for Wednesday. I recommend getting another approval on the PR once you have addressed the comments above.

@julialeex changed the title from "feat/cody: Add Prompt Caching to Code Context (CODY-4807)" to "feat(context): Add Prompt Caching to Code Context (CODY-4807)" on Jan 31, 2025
@julialeex merged commit f32989a into main on Feb 1, 2025
21 checks passed
@julialeex deleted the jlxu/cache_control branch on February 1, 2025